GraphX: Graph Processing in a Distributed Dataflow Framework

نویسندگان

  • Joseph Gonzalez
  • Reynold Xin
  • Ankur Dave
  • Daniel Crankshaw
  • Michael J. Franklin
  • Ion Stoica
چکیده

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Innovative Approach to RDF Graph Data Processing : Spark with GraphX and Scala

The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted to various “big data” problems. Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks. Data management systems requiring these proper...

متن کامل

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster ...

متن کامل

SPARQL over GraphX

The ability of the RDF data model to link data from heterogeneous domains has led to an explosive growth of RDF data. So, evaluating SPARQL queries over large RDF data has been crucial for the semantic web community. However, due to the graph nature of RDF data, evaluating SPARQL queries in relational databases and common data-parallel systems needs a lot of joins and is inefficient. On the oth...

متن کامل

S2X: Graph-Parallel Querying of RDF with GraphX

RDF has constantly gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system....

متن کامل

An Efficient Approach to Large-Scale Biomedical Graph Datasets for Pathway Analysis and Finding Most Interacting Proteins towards Treatment Efficacy

Integration of complementary functional annotations is an important step in understanding the gene-disease association and the mechanisms underlying complex diseases. The increasing amount of biomedical datasets and their availability drive the demand for parallel and distributed computing, which then imposes a need for scalable and high throughput algorithms and tools. In this paper, we propos...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014